Abstract

In this paper an iterated function system on the space of distribution
functions is built. The inverse problem is introduced and studied
by means of convex optimization problems. Some applications of this
method to the approximation of distribution functions and to estimation
theory are given.

keywords : iterated function systems, optimization, distribution function 
estimation. 

MSC : 62E17, 62H10, 37H 



1 Introduction 

Iterated Function Systems (IFSs) were introduced in the 1980s (Hutchinson
1981, Barnsley and Demko, 1985) as applications of the theory of discrete
dynamical systems and as useful tools to build fractals and other similar
sets. Applications of IFSs can be found in image processing
theory (Forte and Vrscay, 1994), in the theory of stochastic growth models
(Montrucchio and Privileggi, 1999) and in the theory of random dynamical
systems (Arnold and Crauel 1992, Elton and Piccioni 1992, Kwiecinska and
Slomczynski, 2000). The fundamental result (Barnsley and Demko, 1985) on
which the IFS method is based is the Banach fixed point theorem.

In practical applications a crucial problem is the so-called inverse problem.
This can be formulated as follows: given $f$ in some metric space $(S, d)$, find
a contraction $T : S \to S$ that admits a unique fixed point $\tilde f \in S$ such that
$d(f, \tilde f)$ is small enough. In fact, if one is able to solve the inverse problem
with arbitrary precision, it is possible to identify $f$ with the operator $T$ which
has it as fixed point.

The paper is organized as follows: Section 2 introduces a
contractive operator $T$ on the space of distribution functions while, in Section
3, the inverse problem for $T$ is studied in detail. Section 4 is divided into
two parts: in the first, some examples of inverse problems are analyzed and
explicit solutions are given; in the second, we introduce an estimator of
the unknown distribution function based on IFSs.

2 A contraction on the space of distribution 
functions 

Let us denote by $\mathcal{F}([0,1])$ the space of distribution functions $F$ on $[0,1]$
and by $\mathcal{B}([0,1])$ the space of bounded functions on $[0,1]$. Let us further
define, for $F, G \in \mathcal{B}([0,1])$, $d_{\sup}(F, G) = \sup_{x \in [0,1]} |F(x) - G(x)|$, so that
$(\mathcal{F}([0,1]), d_{\sup})$ is a metric space.

Lemma 2.1. The space $(\mathcal{F}([0,1]), d_{\sup})$ is a complete metric space.

Proof. Let $F_n$ be a Cauchy sequence in $\mathcal{F}([0,1])$. Then $F_n$ converges to some $F$
in $(\mathcal{B}([0,1]), d_{\sup})$. Furthermore it is true that

$$F(0) = \lim_{n \to +\infty} F_n(0) = 0, \qquad F(1) = \lim_{n \to +\infty} F_n(1) = 1,$$

and that if $x_1 > x_2$ then:

$$F(x_1) = \lim_{n \to +\infty} F_n(x_1) \ge \lim_{n \to +\infty} F_n(x_2) = F(x_2).$$

To prove the right continuity of $F$ we use the uniform convergence of $F_n$ to
$F$, which allows the two limits to be exchanged, obtaining:

$$\lim_{x \to a^+} F(x) = \lim_{x \to a^+} \lim_{n \to +\infty} F_n(x) = \lim_{n \to +\infty} \lim_{x \to a^+} F_n(x) = F(a).$$

□
On $(\mathcal{F}([0,1]), d_{\sup})$ we define an operator in the following way:

$$TF(x) = p_i F(w_i^{-1}(x)) + \sum_{j=1}^{i-1} p_j + \sum_{j=1}^{i-1} \delta_j, \qquad x \in w_i([a_i, b_i)), \tag{1}$$

where $F \in \mathcal{F}([0,1])$, $k \in \mathbb{N}$ is fixed and:

i) $w_i : [a_i, b_i) \to [c_i, d_i) = w_i([a_i, b_i))$, $i = 1, \dots, k-1$, $w_k : [a_k, b_k] \to [c_k, d_k]$,
with $a_1 = c_1 = 0$ and $b_k = d_k = 1$;

ii) $w_i$, $i = 1, \dots, k$, are increasing and continuous, and $\bigcup_{i=1}^{k} w_i([a_i, b_i)) = [0, 1)$;

iii) $p_i \ge 0$, $i = 1, \dots, k$, $\delta_i \ge 0$, $i = 1, \dots, k-1$, $\sum_{i=1}^{k} p_i + \sum_{i=1}^{k-1} \delta_i = 1$;

iv) if $i \ne j$ then $w_i([a_i, b_i)) \cap w_j([a_j, b_j)) = \emptyset$.

A similar approach has been discussed in La Torre and Rocca (1999) but
here a more general operator is defined. In the following we assume that
the maps $w_i$ and the parameters $\delta_j$ are fixed while the parameters $p_i$ have
to be chosen. To emphasize the dependence of the operator $T$ on the
vector $p = (p_1, \dots, p_k)$ we will write $T_p$ instead of $T$. In many practical cases
the $w_i$ are affine maps. In Corollary 2.1 the hypothesis iii) will be weakened to
allow more general functionals.
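To make definition (1) concrete, here is a minimal Python sketch of the operator for one particular, assumed choice of maps: affine maps $w_i(t) = c_{i-1} + (c_i - c_{i-1})\,t$ sending $[0,1)$ onto consecutive intervals $[c_{i-1}, c_i)$. The names `make_operator` and `breaks` are illustrative and do not come from the paper.

```python
from bisect import bisect_right

def make_operator(breaks, p, delta):
    """Return T with (T F)(x) = p_i F(w_i^{-1}(x)) + sum_{j<i} p_j + sum_{j<i} delta_j.

    Assumption: breaks = [0, c_1, ..., 1] and w_i is the affine map from [0, 1)
    onto the i-th interval [breaks[i], breaks[i+1]) (0-based index i).
    """
    k = len(p)
    assert len(breaks) == k + 1 and len(delta) == k - 1
    assert abs(sum(p) + sum(delta) - 1.0) < 1e-12   # normalisation in condition iii)

    def T(F):
        def TF(x):
            # 0-based index of the interval containing x (last interval closed at 1)
            i = min(bisect_right(breaks, x) - 1, k - 1)
            lo, hi = breaks[i], breaks[i + 1]
            offset = sum(p[:i]) + sum(delta[:i])
            return p[i] * F((x - lo) / (hi - lo)) + offset
        return TF
    return T

# Uniform distribution function on [0, 1] as input
uniform = lambda x: min(max(x, 0.0), 1.0)
T = make_operator([0.0, 0.3, 1.0], p=[0.4, 0.5], delta=[0.1])
G = T(uniform)
print(G(0.0), G(0.3), G(1.0))
```

With these parameters the image `G` starts at 0, ends at 1 and has a jump of size $\delta_1 = 0.1$ at the break point 0.3, so it is again a distribution function.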

Theorem 2.1. $T_p$ is an operator from $\mathcal{F}([0,1])$ to itself.

Proof. It is trivial that $T_pF(0) = 0$ and $T_pF(1) = 1$. Furthermore if $x_1 > x_2$,
without loss of generality, we will consider the two cases:

i) $x_1, x_2 \in w_i([a_i, b_i))$;

ii) $x_1 \in w_{i+1}([a_{i+1}, b_{i+1}))$ and $x_2 \in w_i([a_i, b_i))$.

In case i), recalling that the $w_i$ are increasing maps, we have:

$$\begin{aligned}
T_pF(x_1) &= p_i F(w_i^{-1}(x_1)) + \sum_{j=1}^{i-1} p_j + \sum_{j=1}^{i-1} \delta_j \\
&\ge p_i F(w_i^{-1}(x_2)) + \sum_{j=1}^{i-1} p_j + \sum_{j=1}^{i-1} \delta_j \\
&= T_pF(x_2).
\end{aligned}$$

In case ii) we obtain:

$$\begin{aligned}
T_pF(x_1) - T_pF(x_2) &= p_i + \delta_i + p_{i+1} F(w_{i+1}^{-1}(x_1)) - p_i F(w_i^{-1}(x_2)) \\
&= p_i \left(1 - F(w_i^{-1}(x_2))\right) + p_{i+1} F(w_{i+1}^{-1}(x_1)) + \delta_i \ge 0
\end{aligned}$$

since $p_i \ge 0$, $\delta_i \ge 0$ and $0 \le F(y) \le 1$. Finally, one can prove without
difficulty the right continuity of $T_pF$. □

The following corollary to the previous result will be useful for the
applications in Section 4.

Corollary 2.1. Suppose that $w_i : [a_i, b_i) \to [a_i, b_i)$, $w_i(x) = x$, $p_i = p$,
$\delta_i \ge -p$, $i = 1, \dots, k$. Then $T_p : \mathcal{F}([0,1]) \to \mathcal{F}([0,1])$.

Proof. Looking at the proof of the previous theorem one sees that it is only
necessary to prove that $T_pF$ is a non-decreasing function. Case i) is analogous,
while for case ii), choosing $x_1 > x_2$, we have:

$$\begin{aligned}
T_pF(x_1) - T_pF(x_2) &= p\,(1 - F(x_2)) + p\,F(x_1) + \delta_i \\
&= p\,(F(x_1) - F(x_2)) + p + \delta_i \ge 0.
\end{aligned}$$

□

Theorem 2.2. If $c = \max_{i=1,\dots,k} p_i < 1$, then $T_p$ is a contraction on $(\mathcal{F}([0,1]), d_{\sup})$
with contractivity constant $c$.

Proof. Let $F, G \in (\mathcal{F}([0,1]), d_{\sup})$ and let $x \in w_i([a_i, b_i))$. We have

$$|T_pF(x) - T_pG(x)| \le p_i \left|F(w_i^{-1}(x)) - G(w_i^{-1}(x))\right| \le c\, d_{\sup}(F, G).$$

This implies $d_{\sup}(T_pF, T_pG) \le c\, d_{\sup}(F, G)$. □
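The contraction estimate of Theorem 2.2 can be checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: affine maps from $[0,1)$ onto consecutive intervals, the two test functions $F(x) = x$ and $G(x) = x^2$, and a finite grid in place of the exact supremum. The helper names `apply_T` and `d_sup` are hypothetical.

```python
def apply_T(F, breaks, p, delta):
    """T_p F for affine maps w_i from [0, 1) onto [breaks[i], breaks[i+1])."""
    k = len(p)
    def TF(x):
        # first interval whose right end exceeds x; the last one is closed at 1
        i = next((j for j in range(k) if x < breaks[j + 1]), k - 1)
        lo, hi = breaks[i], breaks[i + 1]
        return p[i] * F((x - lo) / (hi - lo)) + sum(p[:i]) + sum(delta[:i])
    return TF

def d_sup(F, G, m=2000):
    """Grid approximation of sup_x |F(x) - G(x)| on [0, 1]."""
    return max(abs(F(j / m) - G(j / m)) for j in range(m + 1))

breaks, p, delta = [0.0, 0.25, 0.6, 1.0], [0.2, 0.5, 0.2], [0.05, 0.05]
F = lambda x: x
G = lambda x: x * x
c = max(p)                                   # contractivity constant

lhs = d_sup(apply_T(F, breaks, p, delta), apply_T(G, breaks, p, delta))
rhs = c * d_sup(F, G)
print(lhs, rhs, lhs <= rhs + 1e-12)
```

The offsets $\sum_{j<i} p_j + \sum_{j<i} \delta_j$ cancel in the difference $T_pF - T_pG$, which is why only the weights $p_i$ enter the contractivity constant.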

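The following theorem states that small perturbations of the parameters $p_j$
produce small variations of the fixed point of the operator.

Theorem 2.3. Let $p, p^* \in \mathbb{R}^k$ such that $T_pF_1 = F_1$ and $T_{p^*}F_2 = F_2$. Then

$$d_{\sup}(F_1, F_2) \le \frac{1}{1-c} \sum_{j=1}^{k} |p_j - p_j^*|$$

where $c$ is the contractivity constant of $T_p$.

Proof. In fact we have

$$\begin{aligned}
d_{\sup}(F_1, F_2) &= d_{\sup}(T_pF_1, T_{p^*}F_2) \\
&= \max_{i=1,\dots,k} \sup_{x \in w_i([a_i, b_i))} \left| p_i F_1(w_i^{-1}(x)) + \sum_{j=1}^{i-1} p_j - p_i^* F_2(w_i^{-1}(x)) - \sum_{j=1}^{i-1} p_j^* \right| \\
&\le \sum_{j=1}^{k} |p_j - p_j^*| + c\, d_{\sup}(F_1, F_2),
\end{aligned}$$

since

$$\begin{aligned}
&\left| p_i F_1(w_i^{-1}(x)) + \sum_{j=1}^{i-1} p_j - p_i^* F_2(w_i^{-1}(x)) - \sum_{j=1}^{i-1} p_j^* \right| \\
&\quad \le \sum_{j=1}^{i-1} |p_j - p_j^*| + \left| p_i F_1(w_i^{-1}(x)) - p_i F_2(w_i^{-1}(x)) \right| + \left| p_i F_2(w_i^{-1}(x)) - p_i^* F_2(w_i^{-1}(x)) \right| \\
&\quad \le \sum_{j=1}^{i-1} |p_j - p_j^*| + p_i\, d_{\sup}(F_1, F_2) + |p_i - p_i^*| \\
&\quad \le c\, d_{\sup}(F_1, F_2) + \sum_{j=1}^{k} |p_j - p_j^*|.
\end{aligned}$$

Moving the term $c\, d_{\sup}(F_1, F_2)$ to the left-hand side and dividing by $1 - c$ gives the claim. □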



3 The inverse problem as a convex constrained 
optimization problem 

Choose $F \in (\mathcal{F}([0,1]), d_{\sup})$. The aim of solving the inverse problem is to find
a contractive map $T : \mathcal{F}([0,1]) \to \mathcal{F}([0,1])$ which has a fixed point "near"
to $F$. In fact, if it is possible to solve the inverse problem with arbitrary
precision, one can identify the operator $T$ with its fixed point. With a fixed
system of maps $w_i$ and parameters $\delta_j$, the inverse problem can be solved, if
it is possible, by using the parameters $p_i$. These have to be chosen in the
following convex set:

$$C = \left\{ p \in \mathbb{R}^k : p_i \ge 0,\ i = 1, \dots, k,\ \sum_{i=1}^{k} p_i = 1 - \sum_{i=1}^{k-1} \delta_i \right\}.$$
We have the following result.

Theorem 3.1. Choose $\epsilon > 0$ and $p \in C$ such that $p_i \cdot p_j > 0$ for some $i \ne j$.
If $d_{\sup}(T_pF, F) \le \epsilon$, then:

$$d_{\sup}(F, \tilde F) \le \frac{\epsilon}{1 - c}$$

where $\tilde F$ is the fixed point of $T_p$ on $\mathcal{F}([0,1])$ and $c = \max_{i=1,\dots,k} p_i$ is the
contractivity constant of $T_p$.

Proof. The assumptions imply $c < 1$. So we have:

$$d_{\sup}(F, \tilde F) \le d_{\sup}(F, T_pF) + d_{\sup}(T_pF, T_p\tilde F) \le \epsilon + c\, d_{\sup}(F, \tilde F)$$

and so we get the thesis. □

If we wish to find an approximate solution of the inverse problem, we
have to solve the following constrained optimization problem:

$$(P) \qquad \min_{p \in C} d_{\sup}(T_pF, F)$$

It is clear that the ideal solution of (P) consists of finding a $p^* \in C$ such
that $d_{\sup}(T_{p^*}F, F) = 0$. In fact this means that, given a distribution function
$F$, we have found a contractive map $T_p$ which has exactly $F$ as fixed point.
In general, the use of Theorem 3.1 gives us only an approximation of $F$. This can
be improved, once the maps $w_i$ are fixed, by increasing the number of parameters
$p_i$.

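The following result proves the convexity of the function $D(p) = d_{\sup}(T_pF, F)$,
$p \in \mathbb{R}^k$.

Theorem 3.2. The function $D(p) : \mathbb{R}^k \to \mathbb{R}$ is convex.

Proof. For fixed $F$ and $x$, the value $T_pF(x)$ is an affine function of $p$. Hence, if we
choose two vectors $p_1, p_2 \in \mathbb{R}^k$ and $\lambda \in [0, 1]$, then:

$$\begin{aligned}
D(\lambda p_1 + (1 - \lambda) p_2) &= \sup_{x \in [0,1]} \left| T_{\lambda p_1 + (1-\lambda) p_2} F(x) - F(x) \right| \\
&\le \lambda \sup_{x \in [0,1]} |T_{p_1}F(x) - F(x)| + (1 - \lambda) \sup_{x \in [0,1]} |T_{p_2}F(x) - F(x)| \\
&= \lambda D(p_1) + (1 - \lambda) D(p_2).
\end{aligned}$$

□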

Hence, to solve problem (P) one can rely on classical results about convex
programming problems (see for instance Rockafellar and Wets, 1998). A
necessary and sufficient condition for $p^* \in C$ to be a solution of (P) can be
given by the Kuhn-Tucker conditions.
4 Inverse problem for distribution functions
and applications

In this section we consider different problems. We show that for a particular
class of distribution functions the inverse problem can be solved exactly
without solving any optimization problem. Then we discuss two ways of
constructing an IFS to approximate a distribution function $F$ with a finite
number of parameters $p_i$ and maps $w_i$.

As is usual in statistical applications, given a sample of $n$ independent and
identically distributed observations, $(x_1, x_2, \dots, x_n)$, drawn from an unknown
distribution function $F$, one can easily construct the empirical distribution
function (e.d.f.) $F_n$ that reads

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \chi_{(-\infty, x]}(x_i), \qquad x \in \mathbb{R},$$

where $\chi_A$ is the indicator function of the set $A$. Asymptotic optimality
properties of $F_n$ as an estimator of the unknown $F$ when $n$ goes to infinity
are well known and studied (Millar 1979 and 1983). This function has an
IFS representation that is exact and can be found without solving any
optimization problem. We assume that the $x_i$ in the sample are all different
(this assumption is natural if $F$ is a continuous distribution function) and
labelled in increasing order. Let
$w_i(x) : [x_{i-1}, x_i) \to [x_{i-1}, x_i)$, when $i = 1, \dots, n$ (so that $w_1(x) : [0, x_1) \to [0, x_1)$), and
$w_{n+1}(x) : [x_n, x_{n+1}] \to [x_n, x_{n+1}]$, with $x_0 = 0$ and $x_{n+1} = 1$. Assume also that
every map is of the form $w_i(x) = x$. If we choose $p_i = \frac{1}{n}$, $i = 2, \dots, n+1$,
$p_1 = 0$ and

$$\delta_1 = \frac{n-1}{n^2}, \qquad \delta_i = -\frac{1}{n^2}, \quad i = 2, \dots, n,$$

then the following representation holds:

$$T_pF_n(x) = \begin{cases}
0, & i = 1, \\
\frac{1}{n} F_n(x) + \frac{n-1}{n^2}, & i = 2, \\
\frac{1}{n} F_n(x) + \frac{i-2}{n} + \frac{n-i+1}{n^2}, & i = 3, \dots, n+1,
\end{cases}$$

when $x \in [x_{i-1}, x_i)$. Furthermore, from Corollary 2.1 we are guaranteed that

$$\lim_{m \to +\infty} d_{\sup}(T_p^{m} u, F_n) = 0, \qquad \forall u \in \mathcal{F}([0,1]).$$
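The exactness of this representation can be verified directly. The following sketch uses exact rational arithmetic and assumes the parameter values $p_1 = 0$, $p_i = 1/n$ for $i \ge 2$, $\delta_1 = (n-1)/n^2$ and $\delta_i = -1/n^2$ for $i \ge 2$, together with identity maps on the intervals between sorted sample points. The three-point sample and the function names are illustrative only.

```python
from bisect import bisect_right
from fractions import Fraction as Fr

# Hypothetical sample in (0, 1), sorted so that x_1 < x_2 < ... < x_n
sample = sorted([Fr(1, 5), Fr(1, 2), Fr(7, 10)])
n = len(sample)

def edf(x):
    """Empirical distribution function F_n(x) = #{x_i <= x} / n."""
    return Fr(bisect_right(sample, x), n)

# Assumed parameters: p_1 = 0, p_i = 1/n otherwise; delta_1 = (n-1)/n^2, delta_i = -1/n^2
p = [Fr(0)] + [Fr(1, n)] * n
delta = [Fr(n - 1, n * n)] + [Fr(-1, n * n)] * (n - 1)

def T_edf(x):
    """(T_p F_n)(x) with identity maps on the intervals [x_{i-1}, x_i)."""
    i = bisect_right(sample, x)      # 0-based index of the interval containing x
    return p[i] * edf(x) + sum(p[:i]) + sum(delta[:i])

grid = [Fr(j, 100) for j in range(101)]
print(all(T_edf(x) == edf(x) for x in grid))
```

On this grid the operator reproduces $F_n$ exactly, which is what it means for the e.d.f. to be the fixed point of $T_p$.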
Note that, from the point of view of applications, constructing the e.d.f. and
iterating the IFS with the given maps are exactly equivalent if one starts, for
example, with a uniform distribution on $[0,1]$ in the first iteration. So this
is just a case in which we can present an IFS that gives an exact result for this
particular class of distribution functions.

What follows, on the contrary, is more attractive from the point of view of
applications. Suppose that one knows the distribution function $F$ and wants
to construct the IFS which has $F$ as fixed point. In general one has to provide
an infinite set of affine maps $\{w_i, i \in \mathbb{N}\}$ and solve an extremal problem to
find the corresponding sequence of weights $p_i$, $i \in \mathbb{N}$. This problem does not
have a general solution, but at the same time a solution in terms of a finite,
possibly small, number of maps and weights is crucial in applications like
image compression and transmission.

The idea is the following: one can think of $n$ points $(x_1, x_2, \dots, x_n)$ as
if they were drawn from the distribution function $F$, and use the same maps
$w_i$ of the e.d.f. $F_n$; then, instead of using the $p_i$ equal to $1/n$, one solves the
extremal problem as is usual in IFS applications. The corresponding IFS
should have a fixed point that is a "good" approximation of $F$. So it is
sufficient to store the simulated data and the weights instead of $F$ itself.

We take the functional $T_pF$ with the particular choice of $\delta_i = 0$. This
choice is in principle not necessary but simplifies the solution of the problem.
We simulate $n$ i.i.d. observations from the distribution function $F$ and we
use the maps of the e.d.f. above.

We now try to solve the extremal problem

$$\min_{p} d_{\sup}(T_pF, F)$$

under the constraint $\sum_{i=1}^{n} p_i = 1$, $p_i \ge 0$, $i = 1, \dots, n$ (with some $p_i > 0$). The
ideal solution is a set of weights $\{p_i, i = 1, \dots, n\}$ such that $d_{\sup}(T_pF, F) = 0$, and this
is attained in at least one case: if $F$ equals $F_n$ and $p_i = 1/n$. Otherwise we will
obtain some positive number. This means that, in principle, in the worst
case we can approximate $F$ with its empirical distribution function $F_n$. But
we can generally do better. So let us solve the problem: let us fix $x_0 = 0$ and
$x_{n+1} = 1$, then

$$\begin{aligned}
d_{\sup}(T_pF, F) &= \sup_{x \in [0,1]} |T_pF(x) - F(x)| \\
&= \max_{i=1,\dots,n+1} \left\{ \sup_{x \in [x_{i-1}, x_i)} |T_pF(x) - F(x)| \right\} \\
&= \max_{i=1,\dots,n+1} \left\{ \sup_{x \in [x_{i-1}, x_i)} \left| \sum_{j=1}^{i-1} p_j - (1 - p_i) F(x) \right| \right\} \\
&= \max_{i=1,\dots,n+1} \max \left\{ \left| \sum_{j=1}^{i-1} p_j - (1 - p_i) F(x_{i-1}) \right|,\ \left| \sum_{j=1}^{i-1} p_j - (1 - p_i) F(x_i) \right| \right\}
\end{aligned}$$

and the last line is due to the non-decreasing property of $F$.
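The endpoint formula is cheap to evaluate, since only the values of $F$ at the knots are needed. A minimal sketch follows, assuming $\delta_i = 0$, identity maps and a continuous $F$; the function names, the test function $F(x) = x^2$, the knots and the weights are all illustrative. A brute-force grid evaluation is included for comparison.

```python
def collage_distance(F, knots, p):
    """Largest endpoint value |sum_{j<i} p_j - (1 - p_i) F(x)| over all intervals."""
    pts = [0.0] + list(knots) + [1.0]
    best, cum = 0.0, 0.0
    for i, p_i in enumerate(p):
        for x in (pts[i], pts[i + 1]):
            best = max(best, abs(cum - (1.0 - p_i) * F(x)))
        cum += p_i                   # running sum of p_j for the next interval
    return best

def brute_force(F, knots, p, m=1000):
    """Grid approximation of sup_x |T_p F(x) - F(x)| with identity maps."""
    pts = [0.0] + list(knots) + [1.0]
    worst = 0.0
    for j in range(m + 1):
        x = j / m
        i = next((q for q in range(len(p)) if x < pts[q + 1]), len(p) - 1)
        worst = max(worst, abs(p[i] * F(x) + sum(p[:i]) - F(x)))
    return worst

F = lambda x: x * x
knots, p = [0.3, 0.7], [0.2, 0.5, 0.3]
formula = collage_distance(F, knots, p)
brute = brute_force(F, knots, p)
print(formula, brute)
```

For this choice both evaluations give about 0.357, attained at the left endpoint of the last interval.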

Example 4.1. Suppose that $F$ is the distribution function of a uniform
distribution on $[0,1]$ and suppose that we can only draw one observation from $F$
(or choose a point) $x_1$. The empirical distribution function $F_1(x) = \chi_{[x_1, 1]}(x)$
is useless if we have in mind to approximate $F$. Let us use the second
technique: fix $w_1 : [0, x_1) \to [0, x_1)$ and $w_2 : [x_1, 1] \to [x_1, 1]$. We try to solve the
above extremal problem.

$$\begin{aligned}
d_{\sup}(T_pF, F) &= \max_{i=1,2} \max \left\{ \left| \sum_{j=1}^{i-1} p_j - (1 - p_i) F(x_{i-1}) \right|,\ \left| \sum_{j=1}^{i-1} p_j - (1 - p_i) F(x_i) \right| \right\} \\
&= \max \left\{ |0 - (1 - p_1) \cdot 0|,\ |0 - (1 - p_1) x_1|,\ |p_1 - (1 - p_2) x_1|,\ |p_1 - (1 - p_2)| \right\} \\
&= \max \{ 0,\ x_1 (1 - p_1),\ p_1 (1 - x_1),\ 0 \}, \qquad x_1 \in (0, 1),\ p_1 + p_2 = 1.
\end{aligned}$$

Now, minimizing with respect to $p_1$ and $p_2$ under the constraint $p_1 + p_2 = 1$,
one obtains simply $p_1 = x_1$. The resulting functional will be

$$T_{x_1} u(x) = \begin{cases}
x_1\, u(x), & x \in [0, x_1), \\
(1 - x_1)\, u(x) + x_1, & x \in [x_1, 1],
\end{cases}$$

and it is clear that $T_{x_1} u(x)$ with $u(x) = x$, for one iteration only, is closer than $F_1$
to $F(x) = x$, and the approximation gets better and better as $x_1 \to 0$ or $x_1 \to 1$.
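The minimisation in Example 4.1 can be reproduced with a simple grid search over $p_1$ (with $p_2 = 1 - p_1$). This is only a sketch: the value $x_1 = 0.3$ and the grid resolution are arbitrary choices made for illustration.

```python
def objective(p1, x1):
    """d_sup(T_p F, F) for uniform F, two identity maps and p2 = 1 - p1."""
    return max(x1 * (1.0 - p1), p1 * (1.0 - x1))

x1 = 0.3                                     # the single chosen point
grid = [j / 1000 for j in range(1001)]       # candidate values for p1
p_best = min(grid, key=lambda p1: objective(p1, x1))

ifs_error = objective(p_best, x1)            # equals x1 * (1 - x1) at p1 = x1
edf_error = max(x1, 1.0 - x1)                # sup |F_1(x) - x| for the one-point e.d.f.
print(p_best, ifs_error, edf_error)
```

The two non-trivial terms of the maximum move in opposite directions as $p_1$ varies, so the minimum sits where they coincide, at $p_1 = x_1$, with error $x_1(1 - x_1) \le 1/4$. The one-point e.d.f. has sup error $\max\{x_1, 1 - x_1\} \ge 1/2$.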
We now propose a more efficient method to approximate $F$ when $F$ is known
and does not have to be estimated. We have already mentioned that the e.d.f. is the best
estimator of an unknown distribution function $F$, so one can think of sampling
$n$ points from $F$ and using their values to approximate $F$ by $F_n$. As $n \to \infty$,
the statistical literature assures almost sure convergence of $F_n(x)$ to $F(x)$
for every $x$. We have also shown the exact IFS representation of $F_n$. But
this method is not efficient. On the contrary, suppose that $F$ is a continuous
distribution function. As we know $F$, we can think of approximating it by
means of continuous functions instead of simple functions like $F_n$. Choose $n$
points $(x_1, \dots, x_n)$ and assume that $x_0 = 0$ and $x_{n+1} = 1$. One can construct
the following functional

$$T_F u(x) = \left( F(x_i) - F(x_{i-1}) \right) u\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) + F(x_{i-1}), \qquad x \in [x_{i-1}, x_i),$$

$i = 1, \dots, n+1$. Notice that $T_F$ is a particular case of (1) where $p_i =
F(x_i) - F(x_{i-1})$, $\delta_i = 0$ and $w_i(x) : [0, 1) \to [x_{i-1}, x_i)$. This is a contraction
and, at each iteration, $T_F u$ passes exactly through the points $(x_i, F(x_i))$. It is
almost evident that, when $n$ increases, the fixed point of the above functional
will be "close" to $F$. So again, instead of sending an infinite set of weights
and maps, one can send $n$ points and the values of $F$ evaluated at these
points. In summary, only $2n$ values have to be sent to reconstruct
$F$.
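A short sketch of this functional, iterated a few times from the uniform distribution function, is given below. The choice $F(x) = x^2$, the nine equally spaced points and the number of iterations are assumptions made purely for illustration, and `make_TF` is a hypothetical helper name.

```python
def make_TF(F, knots):
    """Build T_F u(x) = (F(x_i) - F(x_{i-1})) u((x - x_{i-1}) / (x_i - x_{i-1})) + F(x_{i-1})."""
    pts = [0.0] + list(knots) + [1.0]
    vals = [F(x) for x in pts]               # only these values of F are ever used

    def T(u):
        def Tu(x):
            # interval [pts[i], pts[i+1]) containing x; the last one is closed at 1
            i = next((q for q in range(len(pts) - 1) if x < pts[q + 1]), len(pts) - 2)
            t = (x - pts[i]) / (pts[i + 1] - pts[i])
            return (vals[i + 1] - vals[i]) * u(t) + vals[i]
        return Tu
    return T

F = lambda x: x * x                          # known continuous distribution function
knots = [j / 10 for j in range(1, 10)]       # n = 9 equally spaced points
T = make_TF(F, knots)

G = lambda x: x                              # start from the uniform d.f.
for _ in range(4):
    G = T(G)

dist = max(abs(G(j / 1000) - F(j / 1000)) for j in range(1001))
print(dist)
```

Every iterate agrees with $F$ at the chosen points, since $u(0) = 0$ is preserved by $T_F$; between the points the iterates approach the fixed point, whose distance from $F$ is controlled by Theorem 3.1.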

For small $n$, the choice of a good grid of points is critical. So one question
arises: how should the $n$ points be chosen? One can proceed case by case, but
as $F$ is a distribution function one can use its properties. We propose the
following solution: take $n$ points $(u_1, u_2, \dots, u_n)$ equally spaced on $[0,1]$ and
define $x_i = F^{-1}(u_i)$, $i = 1, \dots, n$. The points $x_i$ are just the quantiles of
$F$. In this way, it is assured that the profile of $F$ is followed as smoothly
as possible. In fact, if two quantiles $x_i$ and $x_{i+1}$ are relatively distant from each
other, then $F$ is slowly increasing in the interval, and vice versa.

This method is more efficient than simply taking equally spaced points on
$[0, 1]$. If this method of choosing the points is used, the functional simply
appears as

$$T_F u(x) = \frac{1}{n}\, u\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) + \frac{i-1}{n}, \qquad x \in [x_{i-1}, x_i),\ i = 1, \dots, n+1.$$

This suggests an empirical estimator of $F$. If $q_i$, $i = 1, 2, \dots, k$, $k < n$,
are the empirical quantiles of the sample $(x_1, x_2, \dots, x_n)$ of order $i/k$, then an
estimator of the unknown distribution function $F$ can be written as

$$F_{(k)} u(x) = \frac{1}{k}\, u\!\left( \frac{x - q_{i-1}}{q_i - q_{i-1}} \right) + \frac{i-1}{k}, \qquad x \in [q_{i-1}, q_i),$$

$i = 1, \dots, k$, with $q_0 = 0$ and $q_{k+1} = 1$. As $n$ and $k = k(n)$ go to infinity,
$F_{(k)}$ converges to $F$. The relative efficiency of $F_{(k)}$ with respect to $F_n$ is
investigated via simulations. The results are reported in Table 1 for differently
shaped distribution functions and sample sizes. What emerges is that $F_{(k)}$ is
equivalent to the e.d.f. in the sense of the sup-norm.
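One possible reading of this estimator is sketched below. It is not the authors' code, and several details are assumptions: the $k$ intervals have end points $q_0 = 0$ and $q_k = 1$, the interior $q_i$ is the $\lceil i n / k \rceil$-th order statistic, the functional is iterated four times starting from the uniform distribution function, and the Beta(2, 5) sample is arbitrary.

```python
import random

def quantile_estimator(sample, k, iterations=4):
    """Iterate u -> (1/k) u((x - q_{i-1}) / (q_i - q_{i-1})) + (i-1)/k on [q_{i-1}, q_i)."""
    xs, n = sorted(sample), len(sample)
    # Assumed convention: interior quantile of order i/k = ceil(i*n/k)-th order statistic
    q = [0.0] + [xs[-(-i * n // k) - 1] for i in range(1, k)] + [1.0]

    def T(u):
        def Tu(x):
            i = next((j for j in range(k) if x < q[j + 1]), k - 1)   # 0-based interval
            return u((x - q[i]) / (q[i + 1] - q[i])) / k + i / k
        return Tu

    F_hat = lambda x: x                      # uniform d.f. as the starting point
    for _ in range(iterations):
        F_hat = T(F_hat)
    return F_hat, q

random.seed(0)
data = [random.betavariate(2, 5) for _ in range(200)]   # hypothetical sample on (0, 1)
F_hat, q = quantile_estimator(data, k=10)
print([round(F_hat(x), 3) for x in (0.1, 0.25, 0.5)])
```

By construction the estimate takes the value $i/k$ at the $i$-th quantile, so only the $k - 1$ interior quantiles need to be stored.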
Table 1: Simulation results. Values are the arithmetic means over 39 trials.
The functional is iterated 4 times starting with the uniform distribution on
[0,1] as initial point. Functions are evaluated at 29 equally spaced points on
[0,1]. The proposed estimator can be said to be almost equivalent to the best
estimator $F_n$.